Abstract. We address aggregate queries over GIS data and moving object data, where non-spatial data are stored in a data warehouse. We propose a formal data model and query language to express complex aggregate queries. Next, we study the compression of trajectory data, produced by moving objects, using the notions of stops and moves. We show that stops and moves are expressible in our query language and we consider a fragment of this language, consisting of regular expressions to talk about temporally ordered sequences of stops and moves. This fragment can be used to efficiently express data mining and pattern matching tasks over trajectory data.

1 Introduction 

Geographic Information Systems (GIS) have been extensively used in various application domains, ranging from economic, ecological and demographic analysis to city and route planning [17,23]. In recent years, time has been playing an increasingly important role in GIS and spatial data management [14]. One particular line of research in this direction, introduced by Wolfson [4,5,12,21,22,19], concerns moving object data. Moving objects, carrying location-aware devices, produce trajectory data in the form of a sample of (Oid, t, x, y)-tuples that contain object identifier and time-space information.

In this paper, we are interested in aggregate queries over GIS data and moving object data. Typically, when aggregation becomes important, it is advisable to organize the non-spatial data of a GIS in a data warehouse. In a data warehouse, numerical data are stored in fact tables built along several dimensions. For instance, if we are interested in the sales of certain products in stores in a given region, we may consider the sales amounts in a fact table over the three dimensions store, time and product. In general, dimensions are organized into aggregation hierarchies. For example, stores can aggregate over cities, which in turn can aggregate into regions and countries. Each of these aggregation levels can also hold descriptive attributes, like city population, the area of a region, etc. For traditional alphanumeric data, OLAP (On-Line Analytical Processing) [10] comprises a set of tools and algorithms that allow efficient querying of multidimensional databases containing large amounts of data, usually called data warehouses.

Two of the present authors have proposed in previous work a model for smoothly integrating the GIS and OLAP worlds. This model was implemented using open source software [6]. The same authors also proposed a taxonomy of aggregation queries on moving object data [11]. In this paper, we propose a conceptual model and a formal query language that cover the different types of aggregation queries discussed in this taxonomy (see Sections 2 and 3). At the basis of our aggregation query language is a multi-sorted first-order query language $\mathcal{L}_{mo}$ for moving object and GIS data, in which one can specify properties of moving objects, geometric elements of GIS layers, and the OLAP data storing the non-spatial GIS data.

Recently, in the study of moving object data, the concepts of stops and moves were introduced [13,2]. These concepts serve to compress the trajectory data produced by moving objects, using application-dependent places of interest. A designer may want to select a set of places of interest that are relevant to her application. For instance, in a tourist application, such places can be hotels, museums and churches. In a traffic control application, they may be road segments, traffic lights and junctions. We assume that these places of interest are stored in a specific GIS layer. If a moving object spends a sufficient amount of time in a place of interest, this place is considered a stop of the object's trajectory. In between stops, the trajectory has moves. Thus, we can replace a raw trajectory given by (Oid, t, x, y)-tuples by a sequence of application-relevant stops and moves. In this paper, we give a geometric definition of stops and moves and show that they are computable (see Section 4). We also show that this compression can be expressed in the language $\mathcal{L}_{mo}$, and we sketch a sublanguage of $\mathcal{L}_{mo}$ that allows us to talk about temporally ordered sequences of stops and moves (see Section 5). The syntax of this language is given in the form of regular expressions (see Section 6). We show that this language considerably extends the language proposed by Mouza and Rigaux [13], and can be used to efficiently express data mining and pattern matching tasks over trajectory data.

1.1 Running Example 

Now, let us introduce the example we will be using throughout the paper. Figure 1 (left) shows a simplified map of Paris, containing two hotels, denoted Hotel 1 and Hotel 2 (H1 and H2 from here on), the Louvre and the Eiffel tower. We consider three moving objects, O1, O2 and O3. Object O1 goes from H1 to the Louvre and then to the Eiffel tower, where it spends just a few minutes, and returns to the hotel. Object O2 goes from H1 to the Louvre and then to the Eiffel tower, staying a couple of hours at each place, and returns to the hotel. Object O3 leaves H2 for the Eiffel tower, visits the place, and returns to H2. Figure 1 (right) shows an example of these trajectory samples.

In this scenario, a GIS user may be interested in finding out useful trajectory information, like "number of persons going from H1 to the Louvre and the Eiffel tower (visiting both places) on the same day", or "number of persons going from a hotel on the left bank of the Seine to the Louvre in the mornings".

Fig. 1. Running example (left) and a moving object fact table (right)

1.2 Related Work 

GIS and OLAP Interaction. Although some authors have pointed out the benefits of combining GIS and OLAP, not much work has been done in this field. Vega Lopez et al. [20] present a comprehensive survey on spatiotemporal aggregation that includes a section on spatial aggregation. Rivest et al. [18] introduce the concept of SOLAP (standing for Spatial OLAP), and describe the desirable features and operators a SOLAP system should have. Han et al. [7] used OLAP techniques for materializing selected spatial objects, and proposed a so-called Spatial Data Cube. This model only supports aggregation of such spatial objects.

Moving Objects. Many efforts have been made in the field of moving object databases, especially regarding data modeling and indexing. Güting and Schneider [5] provide a good reference to this large corpus of work. Güting et al. proposed a system of abstract data types as extensions to DBMSs to support time-dependent geometries [4]. Hornsby and Egenhofer [8] introduced a framework for modeling moving objects that supports viewing objects at different granularities, depending on the sampling time interval. The possible positions of an object between two observations are estimated to lie within two inverted half-cones that form a lifeline bead, whose projection onto the x-y plane is an ellipse. Another approach studies moving objects on networks, basically represented as graphs. Van de Weghe et al. proposed a qualitative trajectory calculus for objects in a GIS [3], based on the assumption that in a GIS scenario, qualitative information is necessary (and, in general, more useful than quantitative information).

Aggregation is still quite an open field, both in GIS and in moving object scenarios. Meratnia and de By [12] have tackled the topic of aggregation of trajectories, identifying similar trajectories and merging them into a single one, by dividing the area of study into homogeneous spatial units. Papadias et al. [15] index historical aggregate information about moving objects. They aim at building a spatio-temporal data warehouse.

Regarding the addition of semantics to trajectories, Brakatsoulas et al. [1], in the context of trajectory mining in road networks, propose to enrich trajectories of moving objects with information about the relationships between trajectories (e.g., intersect, meets), and between a trajectory and the GIS environment (stay within, bypass, leave). Extending this notion, Damiani et al. [2] introduced the concept of stops and moves, in order to enrich trajectories with semantically annotated data. With a similar idea, Mouza and Rigaux [13] propose a model where trajectories are represented by a sequence of moves. They propose a query language based on regular expressions, aimed at obtaining so-called mobility patterns. However, this language is only geared towards trajectory data, and does not relate trajectories with the GIS environment. Thus, the classes of queries addressed are limited. Moreover, aggregation is not considered in this language.

We can conclude that, although the efforts above address particular problems, 
integrating spatial and warehousing information in a single framework is still in 
its infancy. 

2 A Data Model for Moving Objects 

Our work is based on the data model introduced in [6, 11]. In this section we 
give an overview of this model. We first present the model for spatial data, and 
then we introduce the notion of moving objects. 



2.1 Spatial Data 

A GIS dimension is considered, as usual in databases, as composed of a schema and instances. Figure 2 (left) depicts the schema of a GIS dimension: the bottom level of each hierarchy, denoted the Algebraic part of the dimension, contains the infinitely many points of a layer, and could be described by means of linear algebraic equalities and inequalities [16]. Above this part there is the Geometric part, which stores the identifiers of the geometric elements of the GIS and is used to solve the geometric part of a query (e.g., find the polylines, implemented as linestrings, in a river representation). Each point in the Algebraic part may correspond to one or more elements in the Geometric part. Thus, at the GIS dimension instance level we have rollup relations (denoted $r^{Geom_1 \rightarrow Geom_2}_{L}$). These relations map, for example, points in the Algebraic part to geometry identifiers in the Geometric part. For example, $r^{point \rightarrow polygon}_{L}(x, y, pg_1)$ says that point $(x, y)$ corresponds to a polygon identified by $pg_1$ in the Geometric part (note that a point may correspond to more than one polygon, or to more than one polyline if these polylines intersect each other).

Finally, there is the OLAP part of the dimension. This part contains the conventional OLAP structures, as defined in [9]. The levels in the geometric part are associated with the OLAP part via a function, denoted $\alpha^{A \rightarrow G}_{L,D}$. For instance, $\alpha^{riverId \rightarrow polyline}_{L_r, Rivers}$ associates information about a river in the OLAP part (riverId) with the identifier of a polyline in a layer containing rivers ($L_r$) in the Geometric part.

Fig. 2. A GIS dimension schema (left) and a GIS dimension instance (right)



Example 1. Figure 2 (left) shows a GIS dimension schema, where we defined three layers, for rivers, cities and provinces, respectively. The schema is composed of three graphs; the graph for rivers contains edges saying that a point $(x, y)$ in the algebraic part relates to a line identifier in the geometric part, and that in the same portion of the dimension, this line aggregates on a polyline identifier.

In the OLAP part we have information given by two dimensions, representing districts and rivers, associated with the corresponding graphs, as the figure shows. For example, a river identifier at the bottom level of the Rivers dimension, representing rivers in the OLAP part, is mapped to the polyline dimension level in the geometric part, in the graph of the rivers layer $L_r$.

Figure 2 (right) shows a portion of a GIS dimension instance of the rivers layer $L_r$ of the schema on the left of the figure. Here, an instance of a GIS dimension in the OLAP part is associated with the polyline $pl_1$, which corresponds to the Seine river. For simplicity, we only show four different points at the point level, $\{(x_1, y_1), \ldots, (x_4, y_4)\}$. There is a relation $r^{point \rightarrow line}_{L_r}$ containing the association of points to the lines in the line level, and a relation $r^{line \rightarrow polyline}_{L_r}$ between the line and polyline levels, in the same layer. □
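To make the instance level concrete, the Python sketch below stores the two rollup relations of the rivers layer and the α function as finite sets of tuples, and composes them to recover the points of a river. All identifiers and coordinates (l1, l2, the point values, the "Seine" key) are hypothetical, and a finite sample of four points stands in for the infinite Algebraic part, as in Example 1.

```python
from typing import Dict, Set, Tuple

Point = Tuple[float, float]

# Hypothetical instance of the rivers layer L_r (all identifiers invented).
# r^{point->line}_{L_r}: a finite sample of the Algebraic part.
R_POINT_LINE: Set[Tuple[Point, str]] = {
    ((1.0, 2.0), "l1"), ((3.0, 4.0), "l1"),
    ((5.0, 6.0), "l2"), ((7.0, 8.0), "l2"),
}
# r^{line->polyline}_{L_r}: lines aggregate into a polyline.
R_LINE_POLYLINE: Set[Tuple[str, str]] = {("l1", "pl1"), ("l2", "pl1")}
# alpha: OLAP river identifier -> polyline identifier in the Geometric part.
ALPHA_RIVER_POLYLINE: Dict[str, str] = {"Seine": "pl1"}


def points_of_river(river_id: str) -> Set[Point]:
    """Follow alpha, then both rollup relations downward, to the point level."""
    polyline = ALPHA_RIVER_POLYLINE[river_id]
    lines = {line for line, pl in R_LINE_POLYLINE if pl == polyline}
    return {pt for pt, line in R_POINT_LINE if line in lines}


print(sorted(points_of_river("Seine")))  # [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0)]
```

Keeping the rollups as relations rather than dictionaries mirrors the remark above that one point may belong to several geometric elements.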

Elements in the geometric part can be associated with facts, each fact being quantified by one or more measures, not necessarily numeric. Of course, besides the GIS fact tables, there may also be classical fact tables in the OLAP part, defined in terms of the OLAP dimension schemas. For instance, we could store the population either associated with a polygon identifier, or in a data warehouse fact table with schema (State, Year, Population).
2.2 Moving Object Data Representation

Besides the static information representing geometric components (i.e., the GIS), there will be a Time dimension in the OLAP part for representing time (actually, there could be more than one Time dimension, supporting, for example, different notions of time). Also, as is well known in OLAP, this Time dimension can have different configurations that depend on the application at hand. Moving objects are integrated into this framework using a distinguished fact table denoted Moving Object Fact Table (MOFT).

First, we say what a trajectory is. In practice, trajectories are available as a finite sample of $(t_i, x_i, y_i)$ points, obtained by observation.

Definition 1 (Trajectory). A trajectory is a list of time-space points $((t_0, x_0, y_0), (t_1, x_1, y_1), \ldots, (t_N, x_N, y_N))$, where $t_i, x_i, y_i \in \mathbb{R}$ for $i = 0, \ldots, N$ and $t_0 < t_1 < \cdots < t_N$. We call the interval $[t_0, t_N]$ the time domain of the trajectory. □

For the sake of finite representability, we may assume that the time-space points $(t_i, x_i, y_i)$ have rational coordinates.
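Definition 1 amounts to a simple well-formedness check. The Python sketch below is an illustration rather than part of the model; it uses `fractions.Fraction` for the rational coordinates assumed above, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Sequence, Tuple

Sample = Tuple[Fraction, Fraction, Fraction]  # (t, x, y)


def is_trajectory(points: Sequence[Sample]) -> bool:
    """True iff the list is non-empty and its time stamps strictly increase."""
    return len(points) > 0 and all(a[0] < b[0] for a, b in zip(points, points[1:]))


def time_domain(points: Sequence[Sample]) -> Tuple[Fraction, Fraction]:
    """The interval [t_0, t_N] of a trajectory."""
    if not is_trajectory(points):
        raise ValueError("not a trajectory in the sense of Definition 1")
    return points[0][0], points[-1][0]


traj = [(Fraction(0), Fraction(1), Fraction(1)),
        (Fraction(1, 2), Fraction(3, 2), Fraction(1)),
        (Fraction(2), Fraction(2), Fraction(5, 3))]
print(time_domain(traj))  # (Fraction(0, 1), Fraction(2, 1))
```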

A moving object fact table (MOFT for short; see the table on the right-hand side of Figure 1) contains a finite number of identified trajectories. Formally:

Definition 2 (Moving Object Fact Table). Given a finite set $\mathcal{T}$ of trajectories, a Moving Object Fact Table (MOFT) for $\mathcal{T}$ is a relation with schema $\langle Oid, T, X, Y \rangle$, where Oid is the identifier of the moving object, T represents time instants, and X and Y represent the spatial coordinates of the objects. An instance $\mathcal{M}$ of the above schema contains a finite number of tuples of the form $(O_i, t, x, y)$, which represent the position $(x, y)$ of the object $O_i$ at instant t, for the trajectories in $\mathcal{T}$. □
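Read operationally, a MOFT is a finite set of $(O_i, t, x, y)$ tuples from which each object's trajectory is recovered by selecting on Oid and ordering by time. The Python sketch below assumes at most one tuple per object and instant, which is what Definition 1 requires of each trajectory; the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MoftRow = Tuple[str, float, float, float]  # (Oid, t, x, y)
Sample = Tuple[float, float, float]        # (t, x, y)


def trajectories(moft: Iterable[MoftRow]) -> Dict[str, List[Sample]]:
    """Group MOFT tuples by object identifier and sort each group by time."""
    groups: Dict[str, List[Sample]] = defaultdict(list)
    for oid, t, x, y in moft:
        groups[oid].append((t, x, y))
    result = {}
    for oid, samples in groups.items():
        samples.sort()
        times = [t for t, _, _ in samples]
        if len(set(times)) != len(times):
            raise ValueError(f"object {oid} has two positions at the same instant")
        result[oid] = samples
    return result


moft = {("O1", 2.0, 3.0, 1.0), ("O1", 0.0, 1.0, 1.0), ("O2", 0.0, 5.0, 5.0)}
print(trajectories(moft)["O1"])  # [(0.0, 1.0, 1.0), (2.0, 3.0, 1.0)]
```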

3 A Query Language for Aggregation of Moving Object Data

The aggregation queries that we address in this paper are based on a first-order moving object query language $\mathcal{L}_{mo}$, and they are of the following types:

- the Count operator applied to sets of the form $\{Oid \mid \phi(Oid)\}$, where moving object identifiers satisfying some $\mathcal{L}_{mo}$-definable property $\phi$ are collected;

- the Count operator applied to sets of the form $\{(Oid, t) \mid \phi(Oid, t)\}$, where moving object identifiers combined with time instants, satisfying some $\mathcal{L}_{mo}$-definable property $\phi$, are collected (assuming that this set is finite; otherwise the count is undefined);

- the Count operator applied to sets of the form $\{(Oid, t, x, y) \mid \phi(Oid, t, x, y)\}$, where moving object identifiers combined with time and space coordinates, satisfying some $\mathcal{L}_{mo}$-definable property $\phi$, are collected (assuming that this set is finite);

- the Area operator applied to sets of the form $\{(x, y) \in \mathbb{R}^2 \mid \phi(x, y)\}$, which define some $\mathcal{L}_{mo}$-definable part of the plane (assuming that this set is linear and bounded);

- the Count, Max and Min operators applied to sets of the form $\{t \in \mathbb{R} \mid \phi(t)\}$, when the $\mathcal{L}_{mo}$-definable condition $\phi$ defines a finite set of time instants, and the TimeSpan operator when $\phi$ defines an infinite but bounded set of time instants (the semantics of Count, Max and Min is clear, and TimeSpan returns the difference between the maximal and minimal moments in the set);

- the Max-I, Min-I, Avg-I and TimeSpan-I operators applied to sets of the form $\{(t_s, t_f) \in \mathbb{R}^2 \mid \phi(t_s, t_f)\}$, which represent an $\mathcal{L}_{mo}$-definable set of time intervals. Max-I, Min-I and Avg-I return, respectively, the maximum, minimum and average length of the intervals, if there is a finite number of intervals; TimeSpan-I returns the timespan of the union of these intervals;

- the Area operator applied to sets of the form $\{gid \mid \phi(gid)\}$, where identifiers of elements of some geometry (in the geometric part of our data model) satisfying an $\mathcal{L}_{mo}$-definable property $\phi$ are collected. The meaning of this operator is the total area covered by the geometric elements corresponding to the identifiers.
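For finite inputs, the time-oriented operators above reduce to a few lines of code. The Python sketch below takes the sets as already materialized, finite collections; it implements TimeSpan-I as the latest end point minus the earliest start point, which is our reading of the TimeSpan semantics applied to the union of the intervals rather than a definition taken from the text. Function names are hypothetical.

```python
from statistics import mean
from typing import Iterable, List, Tuple

Interval = Tuple[float, float]  # (t_s, t_f) with t_s <= t_f


def time_span(instants: Iterable[float]) -> float:
    """TimeSpan: maximal minus minimal moment of a non-empty bounded set."""
    ts = list(instants)
    return max(ts) - min(ts)


def _lengths(intervals: Iterable[Interval]) -> List[float]:
    return [tf - ts for ts, tf in intervals]


def max_i(intervals: Iterable[Interval]) -> float:
    return max(_lengths(intervals))


def min_i(intervals: Iterable[Interval]) -> float:
    return min(_lengths(intervals))


def avg_i(intervals: Iterable[Interval]) -> float:
    return mean(_lengths(intervals))


def time_span_i(intervals: Iterable[Interval]) -> float:
    """TimeSpan of the union: latest end point minus earliest start point."""
    ivs = list(intervals)
    return max(tf for _, tf in ivs) - min(ts for ts, _ in ivs)


stops = [(0.0, 10.0), (25.0, 40.0), (50.0, 80.0)]
print(max_i(stops), min_i(stops), time_span_i(stops))  # 30.0 10.0 80.0
```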

Obviously, the above list is not complete, but it covers the most interesting and usual cases (see [11] for an extensive list of examples of moving object aggregation queries). For instance, sets like $\{(t, x) \in \mathbb{R}^2 \mid \phi(t, x)\}$ do not correspond to any obvious entity we would like to aggregate over.

To complete the description of our moving object aggregation language, it remains to define the query language $\mathcal{L}_{mo}$. In the $\mathcal{L}_{mo}$-definable sets considered above, we can see that there are variables of different kinds, like Oid, t, x, y and gid. In fact, $\mathcal{L}_{mo}$ is a multi-sorted first-order logic using variables of these types to define sets as considered above. We now define $\mathcal{L}_{mo}$ more formally.

Definition 3. The first-order query language $\mathcal{L}_{mo}$ has four types of variables: real variables $x, y, t, \ldots$; name variables $Oid, \ldots$; geometric identifier variables $gid, \ldots$; and dimension level variables $a, b, c, \ldots$ (which are also used for dimension level attributes).

Besides (existential and universal) quantification over all these variables, and the usual logical connectives $\wedge$, $\vee$, $\neg$, we consider the following functions and relations to build atomic formulas in $\mathcal{L}_{mo}$:

- for every rollup function in the OLAP part, we have a function symbol $f^{A_i \rightarrow A_j}_{D_k}$, where $A_i$ and $A_j$ are dimension levels and $D_k$ is a dimension;

- analogously, for every rollup relation in the GIS part, we have a relation symbol $r^{G_i \rightarrow G_j}_{L_k}$, where $G_i$ and $G_j$ are geometries and $L_k$ is a layer;

- for every $\alpha$ relation associating the OLAP and GIS parts in some layer $L_k$, we have a relation symbol $\alpha^{A_i \rightarrow G_j}_{L_k, D_l}$, where $A_i$ is an OLAP dimension level, $G_j$ is a geometry, $L_k$ is a layer and $D_l$ is a dimension;

- for every dimension level A and every attribute B of A, denoted A.B, there is a function $\beta^{A.B}_{D_k}$ that maps elements of A to elements of B in dimension $D_k$;

- we have functions, relations and constants that can be applied to the alphanumeric data in the OLAP part (e.g., we have the $\in$ relation to say that an element belongs to a dimension level, we may have $<$ on income values and the function concat on string values);

- for every MOFT, we have a 4-ary relation $\mathcal{M}_i$;

- we have the arithmetic operations + and ×, the constants 0 and 1, and the relation < for real numbers;

- finally, we assume the equality relation for all types of variables.

If needed, we may also assume other constants, e.g., for object identifiers. □

Definition 3 describes the syntax of the language $\mathcal{L}_{mo}$. The interpretation of all variables, functions, relations and constants is standard, as is that of the logical connectives and quantifiers. We do not define the semantics formally, but illustrate it through an elaborate example.

Example 2. Let us consider the query "Give the total number of buses per hour in the morning in the Paris districts with a monthly income of less than €1500."

We use the MOFT $\mathcal{M}$ (Figure 1, right), which contains the moving object samples. For clarity, we denote the geometry polygons by Pg, polylines by Pl and points by Pt. We use distr to denote the level district in the OLAP part of the dimension schema. The GIS layer which contains district information is called $L_d$. As in the above definition, we assume that the layers to which a function refers are implicit in the function's name. For instance, in the expression $\alpha^{distr \rightarrow Pg}_{L_d, Distr}(n) = g_1$, the district variable n is mapped to the polygon identifier $g_1$ in the layer $L_d$, as indicated by the function $\alpha^{distr \rightarrow Pg}_{L_d, Distr}$ (here Distr is a dimension in the OLAP part representing districts). Thus, the query returning the region with the required income is expressed as:

$$\{(x, y) \mid \exists n\, \exists g_1\, (r^{Pt \rightarrow Pg}_{L_d}(x, y, g_1) \wedge \alpha^{distr \rightarrow Pg}_{L_d, Distr}(n) = g_1 \wedge \beta^{distr.income}_{Distr}(n) < 1500)\}.$$

In this expression, $r^{Pt \rightarrow Pg}_{L_d}(x, y, g_1)$ relates points to polygons in the district layer; the function $\alpha^{distr \rightarrow Pg}_{L_d, Distr}(n) = g_1$ maps the district identifier n in the OLAP part to the geometry identifier $g_1$; and $\beta^{distr.income}_{Distr}(n)$ maps the district identifier n to the value of the income attribute, which is then compared by the OLAP relation < with the OLAP constant 1500.

The instants corresponding to the morning hours mentioned in the fact tables are obtained through the rollup functions in the Time dimension. We assume in the Time dimension a category denoted timeOfDay, rolling up to the dimension category hour (i.e., timeOfDay → hour). The tuples of the fact table corresponding only to morning hours are selected by the following expression:

$$\mathcal{M}_{morning} = \{(Oid, t, x, y) \mid f^{timeOfDay \rightarrow hour}_{Time}(t) = \text{"Morning"} \wedge \mathcal{M}(Oid, t, x, y)\}.$$

In this formula, "Morning" appears as a constant related to the OLAP part. Finally, the query we discuss reads:

$$\mathrm{Count}\,\{(Oid, t, x, y) \mid \exists n\, \exists g_1\, (\mathcal{M}_{morning}(Oid, t, x, y) \wedge r^{Pt \rightarrow Pg}_{L_d}(x, y, g_1) \wedge \alpha^{distr \rightarrow Pg}_{L_d, Distr}(n) = g_1 \wedge \beta^{distr.income}_{Distr}(n) < 1500)\}.$$

If we changed the given aggregation query to "Give the total number of buses per hour in the morning within 3 km of a Paris district with a monthly income of less than €1500", then we would need +, × and < to express the distance constraint. This would introduce a quadratic polynomial into the formula, expressing that some points are less than 3 km apart. This concludes the example. □
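Over a finite MOFT and finitely many district polygons, the query of Example 2 can be evaluated directly. The Python sketch below is only an illustration under several assumptions of ours: the district polygons, the α mapping, the income attribute and the rollup of hours to "Morning" (taken here as hours 6 to 11) are all hypothetical, time is measured in hours of the day, and the point-to-polygon relation is decided by even-odd ray casting, which does not treat boundary points exactly. Like the set-builder expression, it counts MOFT tuples, here broken down per hour.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
MoftRow = Tuple[str, float, float, float]  # (Oid, t in hours, x, y)

# Hypothetical GIS/OLAP data for the district layer L_d.
DISTRICT_POLYGON: Dict[str, List[Point]] = {
    "pg1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "pg2": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
ALPHA_DISTR_PG = {"d1": "pg1", "d2": "pg2"}  # plays the role of alpha^{distr->Pg}
INCOME = {"d1": 1200.0, "d2": 2500.0}       # plays the role of beta^{distr.income}
MORNING_HOURS = range(6, 12)                 # assumed rollup: hour -> "Morning"


def inside(p: Point, poly: Sequence[Point]) -> bool:
    """Even-odd ray casting test for a simple polygon."""
    x, y = p
    result = False
    for (x1, y1), (x2, y2) in zip(poly, list(poly[1:]) + [poly[0]]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            result = not result
    return result


def morning_buses_per_hour(moft: Sequence[MoftRow]) -> Counter:
    """Count MOFT tuples in the morning inside a low-income district, per hour."""
    low_income = [DISTRICT_POLYGON[ALPHA_DISTR_PG[n]]
                  for n, inc in INCOME.items() if inc < 1500]
    hits = {(oid, t, x, y) for oid, t, x, y in moft
            if int(t) in MORNING_HOURS and any(inside((x, y), pg) for pg in low_income)}
    return Counter(int(t) for _, t, _, _ in hits)


moft = [("b1", 8.0, 5.0, 5.0), ("b1", 9.0, 25.0, 5.0),
        ("b2", 14.0, 5.0, 5.0), ("b2", 7.5, 2.0, 3.0)]
per_hour = morning_buses_per_hour(moft)
print(sum(per_hour.values()), sorted(per_hour.items()))  # 2 [(7, 1), (8, 1)]
```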

Proposition 1. Moving object queries expressible in $\mathcal{L}_{mo}$ are computable. The proposed aggregation operators are also computable.

Proof. (Sketch) The semantics of $\mathcal{L}_{mo}$ expressions is straightforward, apart from the subexpressions that involve +, × and < on real numbers and quantification over real numbers. These subexpressions belong to the formalism of constraint databases, and they can be evaluated by quantifier elimination techniques [16].

The restrictions that we imposed on the applicability of the aggregation operators make sure that they can be effectively evaluated. In particular, the area of a set $\{(x, y) \in \mathbb{R}^2 \mid \phi(x, y)\}$ is computable when this set is semi-linear and bounded. This area can be obtained by triangulating such linear sets and adding the areas of the triangles. □
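The last step of the proof can be illustrated on the simplest bounded semi-linear set, a single simple polygon given by its vertices. The Python sketch below fans triangles out from the first vertex and adds their signed areas; for a simple polygon this sum equals the shoelace formula, so it also works for non-convex polygons, even though the individual fan triangles may then overlap or lie partly outside the polygon. It is a stand-in for a proper triangulation of a general semi-linear set (unions, holes), which it does not handle; `Fraction` keeps the computation exact on rational coordinates, and the function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Point = Tuple[Fraction, Fraction]


def signed_triangle_area(a: Point, b: Point, c: Point) -> Fraction:
    """Half the cross product (b - a) x (c - a); positive if counter-clockwise."""
    return ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2


def polygon_area(vertices: Sequence[Point]) -> Fraction:
    """Area of a simple polygon as the sum of signed fan triangles from vertex 0."""
    v0 = vertices[0]
    total = sum((signed_triangle_area(v0, vertices[i], vertices[i + 1])
                 for i in range(1, len(vertices) - 1)), Fraction(0))
    return abs(total)


def pts(*coords: Tuple[int, int]) -> List[Point]:
    return [(Fraction(x), Fraction(y)) for x, y in coords]


l_shape = pts((0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2))  # non-convex
print(polygon_area(l_shape))  # 3
```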

4 Stops and Moves of Trajectories 

In this section, we define what the stops and moves of a trajectory are. In a GIS scenario, this definition depends on the particular places of interest of a particular application. For instance, in a tourist application, places of interest may be hotels, museums and churches. In a traffic application, places of interest may be road segments, road junctions and traffic lights. First, we define the notion of "places of interest of an application" (PIA).

Definition 4. [Places of Interest] A place of interest (PoI) C is a tuple $(R_C, \Delta_C)$, where $R_C$ is a (topologically closed) polygon, polyline or point in $\mathbb{R}^2$ and $\Delta_C$ is a strictly positive real number. The set $R_C$ is called the geometry of the PoI C and $\Delta_C$ is called its minimum duration.

The places of interest of an application (PIA) $\mathcal{P}$ is a finite collection of PoIs with mutually disjoint geometries. □

Definition 5. [Stops and moves of a trajectory] Let $T = ((t_0, x_0, y_0), (t_1, x_1, y_1), \ldots, (t_N, x_N, y_N))$ be a trajectory and let $\mathcal{P} = \{C_1 = (R_{C_1}, \Delta_{C_1}), \ldots, C_n = (R_{C_n}, \Delta_{C_n})\}$ be a PIA.

A stop of T with respect to $\mathcal{P}$ is a maximal contiguous subtrajectory $((t_i, x_i, y_i), (t_{i+1}, x_{i+1}, y_{i+1}), \ldots, (t_{i+\ell}, x_{i+\ell}, y_{i+\ell}))$ of T such that for some $k \in \{1, \ldots, n\}$ the following holds: (a) $(x_{i+j}, y_{i+j}) \in R_{C_k}$ for $j = 0, 1, \ldots, \ell$; (b) $t_{i+\ell} - t_i > \Delta_{C_k}$.



Fig. 3. An example of a trajectory with two stops and three moves.

A move of T with respect to $\mathcal{P}$ is: (a) a maximal contiguous subtrajectory of T in between two temporally consecutive stops of T; (b) a maximal contiguous subtrajectory of T in between the starting point of T and the first stop of T; (c) a maximal contiguous subtrajectory of T in between the last stop of T and the ending point of T; (d) the trajectory T itself, if T has no stops. □

Figure 3 illustrates these concepts. In this example, there are four places of interest with geometries $R_{C_1}$, $R_{C_2}$, $R_{C_3}$ and $R_{C_4}$. The trajectory T is depicted here by linearly interpolating between its sample points, to indicate their order. Let us imagine that T is traversed from left to right. If the three sample points in $R_{C_1}$ are temporally far enough apart (longer than $\Delta_{C_1}$), they form a stop. Imagine that further on, only the two sample points in another of these geometries are temporally far enough apart to form a stop. Then we have two stops in this example and three moves.

We remark that our definition of stops and moves of a trajectory is arbitrary and can be modified in many ways. For example, if we worked with the linear interpolation of trajectory samples rather than with the samples themselves, we would see in Figure 3 that the trajectory briefly leaves one of the PoI geometries (not at a sample point, but in the interpolation). We could incorporate a tolerance for this kind of small exit from PoIs into the definition, if we defined stops and moves in terms of continuous trajectories rather than in terms of samples. The following property is straightforward.

Proposition 2. There is an algorithm that returns, for any input $(\mathcal{P}, T)$ with $\mathcal{P}$ a PIA and T a trajectory $((t_0, x_0, y_0), (t_1, x_1, y_1), \ldots, (t_n, x_n, y_n))$, the stops of T with respect to $\mathcal{P}$. This algorithm works in time $O(n \cdot p)$, where p is the complexity of answering the point query [17]. □
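A direct reading of Definition 5 gives an algorithm of the kind stated in Proposition 2: locate each sample in the PIA, group maximal runs of consecutive samples that fall in the same PoI, and keep the runs that last longer than that PoI's minimum duration. The Python sketch below makes several simplifying assumptions of ours: PoI geometries are simple polygons only (no polylines or points), the point query is a linear scan over the PIA (so p grows with the number of PoIs, where a spatial index would do better), and even-odd ray casting decides containment, which is not exact for samples on a polygon boundary even though Definition 4 takes geometries to be closed. The names `PoI`, `locate`, `stops` and `moves` are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Sample = Tuple[float, float, float]  # (t, x, y)


@dataclass(frozen=True)
class PoI:
    gid: str                    # identifier of the PoI geometry
    polygon: Tuple[Point, ...]  # vertices of a simple polygon R_C
    min_duration: float         # Delta_C > 0


def inside(p: Point, poly: Sequence[Point]) -> bool:
    """Even-odd ray casting; boundary points are not handled exactly."""
    x, y = p
    result = False
    for (x1, y1), (x2, y2) in zip(poly, list(poly[1:]) + [poly[0]]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            result = not result
    return result


def locate(p: Point, pia: Sequence[PoI]) -> Optional[PoI]:
    """Point query by linear scan; disjoint geometries give at most one hit."""
    for poi in pia:
        if inside(p, poi.polygon):
            return poi
    return None


def stops(traj: Sequence[Sample], pia: Sequence[PoI]) -> List[Tuple[str, int, int]]:
    """Stops of traj as (gid, first index, last index), in temporal order."""
    result = []
    runs = groupby(enumerate(traj), key=lambda item: locate((item[1][1], item[1][2]), pia))
    for poi, run in runs:  # maximal runs of consecutive samples in the same PoI
        members = list(run)
        if poi is None:
            continue
        first, last = members[0][0], members[-1][0]
        if traj[last][0] - traj[first][0] > poi.min_duration:
            result.append((poi.gid, first, last))
    return result


def moves(n: int, stop_list: Sequence[Tuple[str, int, int]]) -> List[Tuple[int, int]]:
    """Half-open index ranges [a, b) of the samples strictly between stops.

    A range may be empty, e.g. when the trajectory starts inside a stop.
    """
    bounds = [(-1, -1)] + [(s, e) for _, s, e in stop_list] + [(n, n)]
    return [(prev[1] + 1, nxt[0]) for prev, nxt in zip(bounds, bounds[1:])]


square_a = PoI("A", ((0, 0), (2, 0), (2, 2), (0, 2)), 5.0)
square_b = PoI("B", ((10, 0), (12, 0), (12, 2), (10, 2)), 5.0)
traj = [(0, 1, 1), (3, 1.5, 1), (7, 1, 1.5), (8, 5, 5), (9, 11, 1), (12, 11, 1.5), (13, 20, 20)]
s = stops(traj, [square_a, square_b])
print(s, moves(len(traj), s))  # [('A', 0, 2)] [(0, 0), (3, 7)]
```

Grouping with `itertools.groupby` is what makes each run maximal: it only merges adjacent samples whose key (the containing PoI) is equal.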

5 A Stops and Moves Fact Table

Let the places of interest (PoIs) of an application (PIA) be given. In this section, we describe how we go from MOFTs to application-dependent compressed MOFTs, where $(Oid, t_i, x_i, y_i)$ tuples are replaced by $(Oid, gid, t_s, t_f)$ tuples. In the latter tuples, Oid is a moving object identifier, gid is an identifier of the geometry of a place of interest, and $t_s$ and $t_f$ are two time moments that encode the time interval $[t_s, t_f]$ of a stop. The idea is to replace the trajectories stored as samples in a MOFT by a stops MOFT that represents the same trajectories more concisely, by listing their stops and the time intervals spent in them.

In our model, application information about the PoIs is stored in the OLAP part as OLAP dimensions. For example, if hotels are places of interest, we need to create a dimension Hotels such that its bottom level contains the identifiers of the hotels, together with some hierarchy that is specific to hotels (e.g., a hotel may belong to the 3-star category). Given that these dimensions depend on a particular application, we define, at the conceptual level, a Generic virtual dimension, from which different dimensions can be generated.

To start with, we assume that the places of interest are stored in the moving object OLAP model in the following way: the geometries of the PoIs are represented in a GIS layer denoted $L_{PoI}$ (e.g., a layer containing polygons that represent hotels, or a layer containing polylines that represent street segments). Data describing the places of interest are stored in the OLAP part.

Figure 4 illustrates how the information about the places of interest is represented in our model. In this figure, the OLAP part contains a virtual dimension, which we call the Generic PoI, that will be instantiated by as many types of places of interest as a particular application requires (in the figure, we show an instantiation for hotels). The bottom level of this dimension is denoted $PoI_b$. There is also a function that maps the bottom level of the instances of the Generic PoI (GPoI) to geometries in the geometric part, in the layer corresponding to the PoIs, denoted $L_{PoI}$. In Figure 4, hotelId is mapped to the geometry Polygon in the layer $L_{PoI}$. The minimum duration of a PoI is stored as an attribute of the bottom level of the instances of the GPoI, for example, as an attribute of level hotelId in Figure 4. At the instance level, analogously to what we explained in Section 2, the function $\alpha^{PoI_b \rightarrow G}_{L_{PoI}, GPoI}$ maps elements $P_i$ in the bottom level of the instances of the GPoI to the geometric identifiers of the places of interest in the geometric part (in Figure 4, the function is $\alpha^{hotelId \rightarrow Polygon}_{L_{PoI}, Hotels}$).

Definition 6 (SM-MOFT). Let $\mathcal{P} = \{C_1 = (R_{C_1}, \Delta_{C_1}), \ldots, C_n = (R_{C_n}, \Delta_{C_n})\}$ be a PIA and let $\mathcal{M}$ be a MOFT. The SM-MOFT $\mathcal{M}^{sm}$ of $\mathcal{M}$ with respect to $\mathcal{P}$ consists of the tuples $(Oid, gid, t_s, t_f)$ such that (a) Oid is the identifier of a trajectory in $\mathcal{M}$; (b) gid is the identifier of the geometry of a PoI $C_k = (R_{C_k}, \Delta_{C_k})$ of $\mathcal{P}$ such that the trajectory with identifier Oid in $\mathcal{M}$ has a stop in this PoI during the time interval $[t_s, t_f]$. This interval is called the stop interval of this stop. □
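Definition 6 suggests a straightforward way to compute an SM-MOFT from a MOFT: group the tuples by Oid, sort each group by time, and emit one $(Oid, gid, t_s, t_f)$ tuple for each maximal run of samples inside a PoI that lasts longer than its minimum duration. To keep the Python sketch short, PoI geometries are assumed to be closed axis-aligned rectangles (a simplification of ours; Definition 4 allows arbitrary polygons, polylines and points), and `itertools.groupby` supplies the maximal runs, since it groups consecutive elements with equal keys. The PIA layout and all names are hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, List, Optional, Tuple

MoftRow = Tuple[str, float, float, float]  # (Oid, t, x, y)
SmRow = Tuple[str, str, float, float]      # (Oid, gid, ts, tf)
Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax), closed
# Hypothetical PIA layout: gid -> (geometry, minimum duration Delta_C).
Pia = Dict[str, Tuple[Rect, float]]


def poi_of(x: float, y: float, pia: Pia) -> Optional[str]:
    """gid of the (unique, by disjointness) PoI containing (x, y), if any."""
    for gid, ((x0, y0, x1, y1), _) in pia.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return gid
    return None


def sm_moft(moft: List[MoftRow], pia: Pia) -> List[SmRow]:
    """Compress a MOFT into (Oid, gid, ts, tf) stop tuples."""
    by_oid: Dict[str, List[Tuple[float, float, float]]] = defaultdict(list)
    for oid, t, x, y in moft:
        by_oid[oid].append((t, x, y))
    out: List[SmRow] = []
    for oid in sorted(by_oid):
        samples = sorted(by_oid[oid])  # temporal order
        for gid, run in groupby(samples, key=lambda s: poi_of(s[1], s[2], pia)):
            times = [t for t, _, _ in run]
            if gid is not None and times[-1] - times[0] > pia[gid][1]:
                out.append((oid, gid, times[0], times[-1]))
    return out


pia = {"H1": ((0, 0, 2, 2), 5.0), "L": ((10, 0, 12, 2), 5.0)}
moft = [("O1", 20, 11, 1), ("O1", 0, 1, 1), ("O2", 0, 1, 1), ("O1", 10, 1, 1),
        ("O1", 11, 5, 5), ("O1", 30, 11, 1), ("O2", 3, 1, 1), ("O2", 4, 11, 1)]
print(sm_moft(moft, pia))  # [('O1', 'H1', 0, 10), ('O1', 'L', 20, 30)]
```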

The table in Figure 5 (left) gives an example of an SM-MOFT. The following property shows that SM-MOFTs can be defined in the moving object query language $\mathcal{L}_{mo}$.

Proposition 3. There is an $\mathcal{L}_{mo}$ formula $\phi_{sm}(Oid, gid, t_s, t_f)$ that defines the SM-MOFT $\mathcal{M}^{sm}$ of $\mathcal{M}$ with respect to $\mathcal{P}$. □

We omit the proof of this property, but remark that the formula $\phi_{sm}(Oid, gid, t_s, t_f)$ allows us to speak about stops and moves of trajectories in $\mathcal{L}_{mo}$. We can therefore add predicates that define stops and moves of trajectories to $\mathcal{L}_{mo}$ as syntactic sugar.
Fig. 4. Adding Stops to the Data Model



6 A Query Language for Moving Objects 

In this section we show how the language $\mathcal{L}_{mo}$ and the model supporting it can yield sub-languages that address many interesting aggregation queries for moving objects in a GIS environment. We sketch a query language based on path regular expressions, along the lines proposed by [13]. However, our language goes considerably further, taking advantage of the integration of GIS, OLAP and moving objects provided by our model. Moreover, queries that do not require access to the MOFT can be evaluated very efficiently, making use of the SM-MOFT.

The idea is based on the construction of a graph representing the stops and moves of a single trajectory: from the SM-MOFT $\mathcal{M}^{sm}$ we construct a graph G as follows. For each different gid in $\mathcal{M}^{sm}$, there is a node v in G, denoted v(gid), which is assigned a unique node number n. Further, there is an edge in G between two nodes $v(gid_1)$ and $v(gid_2)$ for every pair $t_1, t_2$ of consecutive tuples in $\mathcal{M}^{sm}$ with the same Oid, where $t_1$ refers to $gid_1$ and $t_2$ to $gid_2$. Each node v is augmented with two functions and one set: (a) the function extent(v) returns the identifier $P_{id}$ of the PoI in the OLAP part of the model (i.e., the $P_{id}$ such that $\alpha(P_{id}) = gid$); (b) the function label(v) returns the dimension in the OLAP part to which a given PoI $P_i$ belongs (e.g., Hotels, Museums, and so on); (c) a set of stop intervals (technically a temporal element) STE(v), containing the stop intervals of the object at v. Note that an object may be at a stop more than once within a trajectory. Further, this is an ordered set, given that the intervals are disjoint by definition and consecutive by construction. We call a graph constructed in this way an SM-Graph.
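The construction of an SM-Graph for a single object can be sketched in Python as follows. The sketch takes the object's SM-MOFT rows as (gid, t_s, t_f) triples, numbers nodes in order of first visit (one possible choice of unique node numbers), records each node's stop intervals in temporal order, and adds one edge per pair of consecutive stops, so repeated trips between the same PoIs yield parallel edges. The extent and label functions, which would be lookups into the OLAP part, are left out; the class and function names and the sample rows are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
StopRow = Tuple[str, float, float]  # (gid, ts, tf) for ONE object


@dataclass
class Node:
    number: int
    gid: str
    ste: List[Interval] = field(default_factory=list)  # stop intervals, in order


def sm_graph(rows: List[StopRow]) -> Tuple[Dict[str, Node], List[Tuple[str, str]]]:
    """Nodes keyed by gid and the list of edges between consecutive stops."""
    ordered = sorted(rows, key=lambda r: r[1])  # stop intervals are disjoint
    nodes: Dict[str, Node] = {}
    for gid, ts, tf in ordered:
        if gid not in nodes:
            nodes[gid] = Node(number=len(nodes) + 1, gid=gid)
        nodes[gid].ste.append((ts, tf))
    edges = [(a[0], b[0]) for a, b in zip(ordered, ordered[1:])]
    return nodes, edges


# Hypothetical stops of one object: hotel, museum, attraction, back to the hotel.
rows = [("L", 25, 40), ("H1", 0, 10), ("E", 50, 80), ("H1", 120, 140)]
nodes, edges = sm_graph(rows)
print(nodes["H1"].ste, edges)  # [(0, 10), (120, 140)] [('H1', 'L'), ('L', 'E'), ('E', 'H1')]
```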

Example 3. Let us consider the SM-MOFT $\mathcal{M}^{sm}$ of Figure 5 (left). We will use the SM-Graph for the trajectory with Oid = O2, obtained as $\sigma_{Oid=O2}(\mathcal{M}^{sm})$. In our examples we will also denote by $H_i$, $M_i$ and $T_i$ the instances of hotels, museums and tourist attractions, respectively. Figure 5 (right) shows the SM-Graph. □

SM-Graph nodes (Figure 5, right): node 1: label(1) = Hotel, extent(1) = H1, STE(1) = {[0,10], [120,140]}; node 2: label(2) = Museum, extent(2) = L, STE(2) = {[25,40]}; node 3: label(3) = Tourist attraction, extent(3) = E, STE(3) = {[50,80]}.

Fig. 5. An SM-MOFT for the running example (left); an SM-Graph for this table (right).

We will also need some operators on time intervals. We say that an interval $I_1 = [t_1, t_2]$ strictly precedes $I_2 = [t_3, t_4]$, denoted $I_1 < I_2$, if $t_1 < t_2 < t_3 < t_4$. We also say that $t \triangleleft [t_1, t_2]$ returns True if $t_1 \le t \le t_2$. Note that any two distinct stop intervals $I_1, I_2$ of the same trajectory satisfy either $I_1 < I_2$ or $I_2 < I_1$.
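The interval operators can be written directly as predicates. The sketch below assumes closed intervals given as (start, end) pairs; the function names are hypothetical.

```python
def strictly_precedes(i1, i2):
    """I1 < I2: I1 = [t1, t2] ends strictly before I2 = [t3, t4] starts."""
    t1, t2 = i1
    t3, t4 = i2
    return t1 < t2 < t3 < t4


def within(t, interval):
    """t ◁ [t1, t2]: True if t1 <= t <= t2 (closed interval)."""
    t1, t2 = interval
    return t1 <= t <= t2


def pairwise_ordered(intervals):
    """True if every two distinct stop intervals are ordered by strict precedence."""
    return all(
        strictly_precedes(a, b) or strictly_precedes(b, a)
        for k, a in enumerate(intervals)
        for b in intervals[k + 1:]
    )


print(strictly_precedes((25, 40), (50, 80)))            # True
print(within(30, (25, 40)))                              # True
print(pairwise_ordered([(50, 80), (0, 1), (25, 40)]))    # True
print(pairwise_ordered([(0, 30), (20, 40)]))             # False
```

The last check is the property stated for the stop intervals of one trajectory: overlapping intervals such as [0,30] and [20,40] cannot both occur.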

Now we are ready to define a simplified query language for moving object aggregation that takes advantage of the concept of stops and moves, yet is powerful enough to combine (to some extent) regular path expressions and first-order constraints. We assume that MOFTs are well defined, and thus that the graphs are temporally consistent. In addition, each edge in an SM-Graph is uniquely defined by the stop intervals of its beginning and ending nodes. In other words, if there are two edges from a node $v_1$ to a node $v_2$, then each of these nodes must have two associated stop intervals, $STE(v_1) = \{I_1, I_3\}$ and $STE(v_2) = \{I_2, I_4\}$, where $I_1 < I_2 < I_3 < I_4$ holds.

A first look at the definition of the SM-Graph G reveals that the graph can be seen as a DFA whose accepted words are sequences of node labels. This becomes clear if, in the graph of Figure 5 (right), we replace each node v by label(v). In this case, the nodes labeled $M_i$ and $H_i$ become M and H, respectively (shorthand for Museums and Hotels). We call this graph an ASM-Graph (the A stands for aggregation). As a second observation, we can think of a language such that the DFA accepting it is contained in the ASM-Graph. Thus, a trajectory satisfies a query Q if the DFA of the query is contained in G.

Definition 7 (Regular Expressions Language for Stops and Moves). 

A regular expression on stops and moves, denoted RESM, is an expression generated by the grammar

$$E \leftarrow dim \mid dim[cond] \mid (E)^{*} \mid E.E \mid \epsilon \mid ?$$

where $dim \in D$ (a set of dimension names in the OLAP part), $\epsilon$ is the symbol representing the empty expression, "." denotes concatenation, and cond represents a condition over Cat. The term "?" is a wildcard meaning "any sequence of any number of dim". □
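One way to make the regular-language view concrete is to compile an RESM into a Python regular expression over the label sequence of an unfolded trajectory. This is our own sketch under assumptions, not the paper's evaluation strategy: it omits dim[cond] conditions, uses Python's backtracking `re` engine instead of building a DFA, encodes each stop label as a token `<label>`, and reads "sub-trajectory" as a contiguous run of stops. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Dim:          # dim: a dimension name such as "H" or "M"
    name: str


@dataclass
class Star:         # (E)*
    inner: "RESM"


@dataclass
class Concat:       # E.E
    left: "RESM"
    right: "RESM"


@dataclass
class Eps:          # the empty expression
    pass


@dataclass
class Wildcard:     # "?": any sequence of any number of dims
    pass


RESM = Union[Dim, Star, Concat, Eps, Wildcard]
TOKEN = r"<[^<>]*>"  # one encoded stop label


def to_regex(e: RESM) -> str:
    """Translate an RESM (without conditions) into a Python regex."""
    if isinstance(e, Dim):
        return re.escape(f"<{e.name}>")
    if isinstance(e, Star):
        return f"(?:{to_regex(e.inner)})*"
    if isinstance(e, Concat):
        return to_regex(e.left) + to_regex(e.right)
    if isinstance(e, Eps):
        return ""
    if isinstance(e, Wildcard):
        return f"(?:{TOKEN})*"
    raise TypeError(f"not an RESM: {e!r}")


def matches(e: RESM, labels: list[str]) -> bool:
    """True if some contiguous run of the label sequence matches e."""
    word = "".join(f"<{label}>" for label in labels)
    return re.search(to_regex(e), word) is not None


def seq(*parts: RESM) -> RESM:
    """Right-nested concatenation helper: seq(a, b, c) = a.(b.c)."""
    return parts[0] if len(parts) == 1 else Concat(parts[0], seq(*parts[1:]))


q = seq(Dim("H"), Wildcard(), Dim("M"), Wildcard(), Dim("T"))  # H.?.M.?.T
print(matches(q, ["H", "M", "T", "H"]))  # True
print(matches(q, ["H", "T", "M"]))       # False
```

Since every encoded token starts with `<` and ends with `>`, a match can only begin and end on stop boundaries, so regex matches correspond to runs of whole stops.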

The aggregate language is built on top of RESMs: for each trajectory T in an SM-MOFT such that some sub-trajectory of T matches the RESM, the query returns the Oid of T. The aggregate function COUNT can then be applied to the returned set.

We explain the semantics of the RESM-based language using the query "total number of trajectories that went from a Hilton hotel to a tourist attraction, stopping at a museum", whose RESM reads: COUNT(H[name='Hilton'].?.M.?.T).

Note that name is an attribute of the PoI identifier pid in the OLAP part (i.e., an attribute of the extension of the node). Then, for each trajectory, and for each instantiation of a node in the graph with a value H, M or T, the variable name is instantiated with the value $v_i$ of the attribute name of the pid in the OLAP part such that extension(v).name = $v_i$ (e.g., in the dimension D = Hotel for a node labeled H). The condition on the node is then checked. Finally, if there is a sub-trajectory matching the RESM, its Oid counts for the aggregation.
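The semantics of this COUNT query can be illustrated with a small backtracking matcher. This is a hypothetical sketch restricted to RESMs that are concatenations of dim[cond] items and "?" wildcards (no Kleene star); each trajectory is assumed to be already unfolded into a time-ordered list of stops, and the trajectories and attribute values below are invented for illustration.

```python
def always(stop):
    """Trivial condition, used for a plain dim without [cond]."""
    return True


def match_from(pattern, stops, i, j):
    """Can pattern[i:] match a contiguous run of stops starting at index j?"""
    if i == len(pattern):
        return True
    item = pattern[i]
    if item == "?":  # wildcard: skip any number (possibly zero) of stops
        return any(match_from(pattern, stops, i + 1, k)
                   for k in range(j, len(stops) + 1))
    label, cond = item  # a dim[cond] item: (label, predicate on the stop)
    if j < len(stops) and stops[j]["label"] == label and cond(stops[j]):
        return match_from(pattern, stops, i + 1, j + 1)
    return False


def count(pattern, trajectories):
    """COUNT of the Oids whose trajectory has a sub-trajectory matching pattern."""
    return sum(
        1 for stops in trajectories.values()
        if any(match_from(pattern, stops, 0, j) for j in range(len(stops)))
    )


def stop(label, name):
    """Hypothetical stop record: node label plus the PoI attribute name."""
    return {"label": label, "name": name}


trajectories = {
    "O1": [stop("H", "Hilton"), stop("M", "Louvre"), stop("T", "Eiffel")],
    "O2": [stop("H", "Ritz"), stop("M", "Louvre"), stop("T", "Eiffel")],
    "O3": [stop("H", "Hilton"), stop("T", "Eiffel"), stop("M", "Louvre")],
    "O4": [stop("M", "Orsay"), stop("H", "Hilton"), stop("M", "Louvre"),
           stop("H", "Ritz"), stop("T", "Eiffel")],
}

# COUNT(H[name='Hilton'].?.M.?.T)
query = [("H", lambda s: s["name"] == "Hilton"), "?", ("M", always), "?", ("T", always)]
print(count(query, trajectories))  # 2
```

O1 and O4 are counted; O2 fails the name condition and O3 visits the museum only after the tourist attraction.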

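As another example, consider the query "total number of trajectories that went from a Hilton hotel to the Louvre, in the morning":

$$\mathrm{COUNT}\big(H[\mathit{name}=\text{'Hilton'}].?.M[\mathit{name}=\text{'Louvre'} \wedge \exists t\,(t \triangleleft I \wedge r^{\mathit{timeOfDay}}_{\mathit{timeId}}(t)=\text{'morning'})]\big)$$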

The semantics of the first condition is analogous to that of the query above, and the same holds for the condition over name in M. For the last part of the condition over M, for each trajectory and each instantiation of a node in the graph with a value H or M, I is instantiated with the intervals in STE(v); the condition then holds if some instant t in I satisfies the time-of-day constraint.
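The time condition can be checked per stop interval without enumerating instants. The sketch assumes, hypothetically, that times are hours within a single day and that the rollup to timeOfDay maps the range [6, 12) to 'morning'; under that assumption, some instant $t \triangleleft [t_1, t_2]$ rolls up to 'morning' exactly when the two ranges overlap.

```python
MORNING = (6.0, 12.0)  # hypothetical rollup: instants in [6, 12) are 'morning'


def has_morning_instant(interval, morning=MORNING):
    """Exists t ◁ [t1, t2] whose time of day is 'morning' (hours of one day)."""
    t1, t2 = interval
    lo, hi = morning
    # The closed interval [t1, t2] and the half-open [lo, hi) share an
    # instant exactly when t1 < hi and t2 >= lo (assuming t1 <= t2).
    return t1 < hi and t2 >= lo


def stopped_in_morning(ste):
    """Condition over a node: some interval I in STE(v) contains a morning instant."""
    return any(has_morning_instant(i) for i in ste)


print(stopped_in_morning([(9.5, 10.5)]))       # True
print(stopped_in_morning([(0, 1), (13, 15)]))  # False
```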

Proposition 4. The language defined above is a subset of $\mathcal{L}_{mo}$. □

Proof. (Sketch) The proof is built on the property that, for each trajectory in an SM-MOFT, the SM-Graph can be unfolded and transformed into a sequence of nodes, given that for all nodes v in the graph, all intervals in STE(v) are disjoint. This sequence can then be queried using any FO language with time variables, such as $\mathcal{L}_{mo}$. □
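The unfolding step used in the proof can be sketched as sorting all (interval, node) pairs by interval start. The input format, a dictionary from node number to its list of stop intervals, is an assumption; the check enforces the strict-precedence property of stop intervals noted earlier. The usage line takes the node annotations listed with Figure 5.

```python
def unfold(ste_by_node):
    """Unfold an SM-Graph into the time-ordered sequence of visited nodes.

    ste_by_node maps a node number to its stop intervals STE(v) (assumed format).
    """
    visits = sorted(
        ((interval, node) for node, ste in ste_by_node.items() for interval in ste),
        key=lambda visit: visit[0][0],  # order by interval start
    )
    # Stop intervals of one trajectory must strictly precede one another.
    for (a, _), (b, _) in zip(visits, visits[1:]):
        if not a[1] < b[0]:
            raise ValueError(f"stop intervals {a} and {b} are not disjoint")
    return [node for _, node in visits]


print(unfold({1: [(0, 1), (120, 140)], 2: [(25, 40)], 3: [(50, 80)]}))  # [1, 2, 3, 1]
```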

7 Future Work 

Our future work will focus on the implementation of the model and query languages proposed here, and on their integration with the framework introduced in [6]. We also believe that the RESM language is promising for mining trajectory data, specifically in the context of sequential pattern mining with constraints, and we will work in this direction.